Hao Peng

Beihang University

GLM-5: from Vibe Coding to Agentic Engineering

Feb 17, 2026

Kelix Technical Report

Feb 12, 2026

Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO

Feb 09, 2026

WildReward: Learning Reward Models from In-the-Wild Human Interactions

Feb 09, 2026

Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm

Feb 09, 2026

Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs

Feb 07, 2026

ReBeCA: Unveiling Interpretable Behavior Hierarchy behind the Iterative Self-Reflection of Language Models with Causal Analysis

Feb 06, 2026

Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions

Feb 05, 2026

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Feb 01, 2026

On the Paradoxical Interference between Instruction-Following and Task Solving

Jan 29, 2026